Abstract: In a smart grid, a home appliance can adjust its
power consumption level according to the real-time power price
obtained from communication channels. Most studies on smart
grid do not consider the cost of communications, which cannot
be ignored in many situations. Therefore, the total cost in smart
grid should be jointly optimized with the communication cost.
In this paper, a probabilistic mechanism of locational marginal
price (LMP) is applied and a model for the stochastic evolution
of the underlying load which determines the power price is 
proposed. Based on this framework of power price, the problem 
of determining when to inquire the power price is formulated 
as a Markov decision process and the corresponding elements, 
namely the action space, system state and reward function, are 
defined. Dynamic programming is then applied to obtain the 
optimal strategy. A simpler myopic approach is proposed by 
comparing the cost of communications and the penalty incurred 
by using the old value of power price. Numerical results show 
the significant performance gain of the optimal strategy of price 
inquiry, as well as the near-optimality of the myopic approach. 



I. Introduction 

In recent years, smart grid [1] [6] [10] has attracted signif- 
icant attention in the field of power systems, communications 
and networking. In a smart grid, power is delivered from 
power suppliers to home appliances with the aid of two-way 
communications. The price of power changes with time, sub- 
ject to many random factors like congestion level and power 
generation. Home appliances can inquire the instantaneous 
price and decide the consumption level of power. For example, 
at midnight, the power load is usually low and thus the price 
is relatively low; therefore, an air conditioner may set a higher
temperature (suppose that it is in the winter) and enjoy the 
low power price. 

Existing studies usually pay attention to only the cost of 
power consumption and ignore the cost of communications. 
However, communications for inquiring the price in smart grid 
could incur a nonnegligible cost. Therefore, it is necessary to 
consider the cost of communications and study the optimal 
policy of requesting communications for power price inquiry, 
such that the total cost of power consumption and communi- 
cations is minimized under certain constraints. To the authors' 
best knowledge, no existing work has incorporated the cost of 
communication into the decision procedure in smart grid. 

Fig. 1: The simplified model of smart grid.

Due to the cost of communication, it may not be optimal to
inquire the power price frequently. If the power price changes
slowly, it may be better to use the old power price to optimize
the power consumption level. However, using old power price 
is also risky. For example, when the old power price is much 
higher than the current power price, the home appliance may 
use a lower power consumption, thus wasting the opportunity 
of low power price if it still uses the old price and does 
not inquire the new one. When the old power price is much 
lower than the current price, the policy of using old price 
will result in a high power consumption level, thus incurring 
significant penalty. Therefore, an optimal tradeoff should be 
found between the cost of communications and the penalty 
incurred by using the old value of power price. 

In this paper, we study the decision problem of communica- 
tion for inquiring the power price in order to minimize the total 
cost. A key issue is the prediction of power price based on the 
current obtained price. A model called probabilistic locational 
marginal pricing (LMP) forecasting [4] [5] is applied to 
predict the distribution of the power price at a given future 
time. In this model, a curve is used to map from the true value 
of power load to the power price. A probabilistic model similar 
to Brownian motion is used to describe the distribution of load 
as a functional of the elapsed time since the latest inquiry 
of the power price. Once the evolution law of the power 
price (load) is known, we convert the problem into a Markov 
decision process (MDP) and obtain the corresponding state 
transition probabilities. Then, we apply dynamic programming 
to compute the optimal strategy. To simplify the strategy, we 
also propose a myopic strategy which compares the cost of 
communication with the penalty incurred by using the old 
power price. Note that we assume that the dynamics of the 
power price are perfectly known. In practical cases, when
there is no perfect model for the power price, we can apply 
the approach of reinforcement learning to learn the optimal 
strategy, which is beyond the scope of this paper. 

The remainder of the paper is organized as follows. The 
system model and the price forecasting are introduced in 
Section II. The optimal communication policy is discussed
in Section III. Numerical results are provided in Section IV,
while conclusions are drawn in Section V.

II. System Model and LMP Forecasting 

In this section, we first introduce the system model used in 
this paper. Then, we give a brief introduction to the pricing 
mechanism in the power grid, namely the LMP forecasting 
mechanism. 

A. System Model 

Practical power grids are very complicated. To simplify the
analysis and facilitate the analytical discussion, we consider
a simple supplier-home model, as illustrated in Fig. 1. In this
model, we consider only one power supplier and one home 
appliance. The power supplier provides power for the home 
appliance via power line, as well as the power price via a 
communication channel. The communication channel could be 
over Internet, wireless networks or power line communication 
systems. The home appliance inquires the power price via the 
communication channel. The following assumptions are used 
throughout the paper: 

• We ignore possible errors over the communication chan- 
nel and assume perfect communications. We do not 
consider the details of communications like modulation 
and coding. 

• Time is divided into time slots. At the beginning of each 
time slot, the power supplier adjusts its power price. 
Then, the home appliance can inquire the power price and 
receive the price information with or without delay. Once
the price is obtained, the home appliance determines its
power consumption level. These events are illustrated in
Fig. 2.

• For simplicity, we assume that the appliance can adjust 
its power consumption according to the power price 
immediately. In practice, there could be a delay for the 
power consumption adjustment, e.g. it needs some time 
to start the air conditioner. The corresponding analysis 
will be more complicated and is beyond the scope of this 
paper. 

We denote by $p_t$ the power price and by $x_t$ the power
consumption at the $t$-th time slot. The utility function of
the power consumption is denoted by $U(x_t)$, i.e. the home
receives reward $U(x_t)$ when the power consumption is $x_t$. For
simplicity, we assume that the utility function does not change
with time. The cost of one communication effort is denoted
by $c$, which is assumed to be a constant. We denote by $I_t$ the
event that the home inquires the power price at the $t$-th time
slot, i.e. $I_t = 1$ if it inquires and $I_t = 0$ otherwise. We denote
by $\tau_t$ the latest time slot before time slot $t + 1$ in which the
home inquired the power price. The home appliance uses the
same power price since the previous price inquiry, namely $p_{\tau_t}$
at time slot $t$. Therefore, the decision of power consumption is
based on the power price of the previous price inquiry, i.e. $x_t$
is a function of $p_{\tau_t}$. For simplicity, we assume that the power
consumption level maximizes the net reward and ignores the
communication cost, i.e.

$$x_t(p) = \arg\max_{x} \left( U(x) - p x \right), \tag{1}$$

where $p$ is the price assumed by the home appliance (which may
differ from the true value if the home appliance does not
inquire the power price). We assume that $U$ is an increasing
and strictly concave function (thus the first order derivative is
strictly decreasing). We also assume that $U$ is continuously
differentiable and that its first order derivative, denoted by $U'$,
ranges from $\infty$ to 0. Therefore, the optimal value of the power
consumption level is given by

$$x_t(p) = (U')^{-1}(p), \tag{2}$$

which is derived from the first order condition, i.e.

$$U'(x) - p = 0. \tag{3}$$

Since $U'$ ranges from $\infty$ to 0 and is continuous and strictly
decreasing, there exists a unique solution to (3). Hence, we
have a one-to-one mapping between the price and the optimal
power consumption level.

Although these quantities are simplified, the simplified model
still provides insight that can be extended to more general
cases in the future. Based on the above
definitions, we assume that the total reward of the home
appliance is the discounted sum of the rewards in different
time slots, which is given by

$$R = \sum_{t=0}^{\infty} \beta^t \left( U(x_t) - p_{\tau_t} x_t - c I_t \right), \tag{4}$$

where $\beta$ is the discount factor.

Fig. 2: Events within a time slot.

Fig. 3: The dependency of factors in the supply chain.
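As a rough illustration of the first order condition and the discounted reward, the sketch below assumes the logarithmic utility $U(x) = 100 \log x$ used later in the numerical section, for which $U'(x) = 100/x$ and hence $x(p) = 100/p$. The function names and the finite-horizon truncation of the sum are choices made for this example only and do not come from the paper.

```python
import math

SCALE = 100.0  # assumed utility U(x) = SCALE * log(x)


def optimal_consumption(price, scale=SCALE):
    """Solve U'(x) = p for U(x) = scale * log(x): scale / x = p gives x = scale / p."""
    if price <= 0:
        raise ValueError("price must be positive")
    return scale / price


def net_reward(x, price, scale=SCALE):
    """Per-slot net reward U(x) - p * x, before any communication cost."""
    return scale * math.log(x) - price * x


def discounted_reward(assumed_prices, inquiries, cost=10.0, beta=0.99):
    """Finite-horizon version of the discounted sum of per-slot rewards.

    assumed_prices[t] plays the role of the price used at slot t;
    inquiries[t] is the indicator I_t (0 or 1).
    """
    total = 0.0
    for t, (price, inquired) in enumerate(zip(assumed_prices, inquiries)):
        x = optimal_consumption(price)
        total += beta ** t * (net_reward(x, price) - cost * inquired)
    return total
```

For instance, `optimal_consumption(20.0)` gives a consumption level of 5, and perturbing that level in either direction lowers `net_reward` at the same price, as strict concavity of $U$ implies.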



B. LMP Forecasting 

Power price is usually determined by LMP methodology,
which is actually driven by the time-varying load, as illustrated
in Fig. 3. The mapping between load and LMP is typically
obtained from a constrained optimization problem [9]. In
practice, the mapping can be represented by a piecewise curve,
as illustrated in Fig. 4. Therefore, the uncertainty of the power
price stems from that of the load. We notice that, in Fig. 4, the number
of possible prices is finite. Therefore, we denote by $K$ the
total number of possible prices and by $q_1, q_2, \ldots, q_K$ the
corresponding prices. Meanwhile, we denote by $J_1, \ldots, J_K$
the load intervals corresponding to the prices. Given a price
$q_i$, we assume that the load is uniformly distributed within the
interval $J_i$.

Fig. 4: An illustration of the mapping between load and LMP.

Fig. 5: The diagram of state transition.

In practice, the load is a random variable. Typically, it is
modeled as a Gaussian random variable [4], i.e.

$$D_t \sim \mathcal{N}(\mu_t, \sigma_t^2), \tag{5}$$

where $D_t$ is the load at time slot $t$, and $\mu_t$ and $\sigma_t^2$ are the
corresponding expectation and variance. Note that the assumption
of a Gaussian random variable is an approximation, since a
Gaussian random variable could be negative while a negative
load is not meaningful. To make it mathematically rigorous, we
modify the probability density function (PDF) of $D_t$ as

$$f(D_t) = \frac{\exp\left( -\frac{(D_t - \mu_t)^2}{2\sigma_t^2} \right)}{\int_0^{D_{\max}} \exp\left( -\frac{(x - \mu_t)^2}{2\sigma_t^2} \right) dx}, \qquad 0 \le D_t \le D_{\max}, \tag{6}$$

where $D_{\max}$ is the maximal possible load.

To study the optimal inquiry time for communications, we
need to model the relationship between the time interval and
the Gaussian distribution parameters. Suppose that, at time slot
0, the true value of the load, $D_0$, is known and that there is
no further observation of the true value of the load. Then, we use the
following assumptions for the load distribution at time slot $t$:

• The expectation $\mu_t$ equals $D_0$, i.e. the expectation does
not change with time, which represents an unbiased price
prediction.

• The variance is $\sigma_t^2 = \theta t$, i.e. the variance increases
linearly with the time gap, similarly to a Brownian
motion. The parameter $\theta$ can be estimated from historical
data. The rationale behind the Brownian-motion-like
variance is that Brownian motion is widely used to drive
price fluctuations, e.g. in stochastic differential equations,
in the area of financial analysis [8]. Therefore, we also
use a linearly increasing variance to model the increasing
uncertainty with time, which motivates the price inquiry
over the communication channel.
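The two assumptions above can be made concrete with a short sketch of the load density a given number of slots after the last observation: a Gaussian with mean equal to the observed load and variance growing linearly with the gap, truncated to the feasible load range and renormalized. The function names and the value `d_max = 1000.0` are made up for illustration; the default `theta = 200.0` mirrors the default parameter value quoted in the numerical section.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def load_pdf(d, d0, elapsed, theta=200.0, d_max=1000.0):
    """Density of the load `elapsed` slots after observing the load d0.

    Gaussian with mean d0 and variance theta * elapsed, truncated to
    [0, d_max] and renormalized. d_max = 1000.0 is a hypothetical value.
    """
    if elapsed <= 0:
        raise ValueError("elapsed must be a positive number of slots")
    if d < 0.0 or d > d_max:
        return 0.0
    sigma = math.sqrt(theta * elapsed)
    gaussian = math.exp(-((d - d0) ** 2) / (2.0 * sigma ** 2))
    gaussian /= sigma * math.sqrt(2.0 * math.pi)
    # Probability mass that the untruncated Gaussian places on [0, d_max].
    mass = normal_cdf((d_max - d0) / sigma) - normal_cdf((0.0 - d0) / sigma)
    return gaussian / mass
```

Because the variance grows with the gap, the density at the last observed load flattens over time, e.g. `load_pdf(500.0, 500.0, 1)` is larger than `load_pdf(500.0, 500.0, 9)`.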

III. Optimal Policy for Price Inquiry 

In this section, we study the optimal policy for price inquiry 
over the communication channel. We can model the decisions
of price inquiry as a Markov decision process (MDP) [3]. We
will discuss the fundamental elements in the MDP. Then, we 
apply the approach of value iteration to obtain the optimal 
policy. Note that we assume that the statistical laws of the 
price, like the mapping between the load and the price and the 
uncertainty on the load, are all known to the home appliance. 

A. MDP Modeling 

There are three elements in an MDP problem, namely action
space, system state and reward. We discuss them in the context 
of price inquiry separately. 

1) Action: The action of the home appliance is
the inquiry of the power price (denoted by 1) or not (denoted
by 0). The decision of action is determined by the current
system state and the policy of the home appliance. Actually,
there is an implicit action for the home appliance, i.e. the
power consumption level. In the most complicated case, the
power consumption level should also be a function of the
current system state and policy. However, to simplify the
analysis, we assume that the home appliance uses the
power price of the latest inquiry, which uniquely determines the
power consumption level. Therefore, we do not consider the
power consumption level as an action and do not incorporate
it into the decision policy.

We put an upper bound, denoted by T, for the number 
of time slots between two price inquiries. Then, the home 
appliance must send out a price inquiry within T time slots 
since the previous price inquiry. The upper bound can address 
the jeopardy brought by modeling imperfection, e.g. some 
imprecise parameters may significantly lengthen the interval 
between two inquiries to a harmful level, and thus improves 
the robustness. 

2) System State: The system state contains two parts,
namely the power price in the previous inquiry and the elapsed
time since the previous inquiry. For time slot $t$, the system state
is given by $(p_{\tau_t}, t - \tau_t)$. Obviously, if the home appliance
inquires the power price at time slot $t$, the corresponding
system state is $(p_t, 0)$. An illustration of the state transition
diagram is shown in Fig. 5 for the case of two possible
power prices (high or low). We notice that, whenever the price
substate is changed, the substate of elapsed time is reset to
0. Another important issue in the system state is the state
transition probability with respect to the action. Obviously, if the
action is no inquiry, i.e. 0, the state is changed to $(q_i, \Delta + 1)$
if the previous state is $(q_i, \Delta)$. When the action is inquiry,
i.e. 1, the state is changed to $(q_j, 0)$ from the previous state
$(q_i, \Delta)$, where $j$ is a random variable. We denote by $K_{ij}(\Delta)$
the probability of transiting from price $q_i$ to price $q_j$ after $\Delta$
time slots. The transition probability is then given by

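$$
\begin{aligned}
K_{ij}(\Delta) &= P(p_\Delta = q_j \mid p_0 = q_i) \\
&= P(D_\Delta \in J_j \mid p_0 = q_i) \\
&= \int_0^\infty P(D_\Delta \in J_j \mid D_0, p_0 = q_i) \, f(D_0 \mid p_0 = q_i) \, dD_0 \\
&= \frac{1}{|J_i|} \int_{J_i} P(D_\Delta \in J_j \mid D_0) \, dD_0 \\
&= \frac{1}{|J_i|} \int_{J_i} \int_{J_j} f(D_\Delta \mid D_0) \, dD_\Delta \, dD_0,
\end{aligned}
\tag{7}
$$

where $|J_i|$ is the length of interval $J_i$ and $f(D_\Delta \mid D_0)$ is
the conditional PDF of $D_\Delta$ given $D_0$. The fourth line uses the
assumption that, given the price $q_i$, the load $D_0$ is uniformly
distributed within $J_i$. From (6) and the
assumptions on the expectation and variance, it is easy to verify
that the conditional PDF is given by

$$f(D_\Delta \mid D_0) = \frac{\exp\left( -\frac{(D_\Delta - D_0)^2}{2\theta\Delta} \right)}{\int_0^{D_{\max}} \exp\left( -\frac{(x - D_0)^2}{2\theta\Delta} \right) dx}. \tag{8}$$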
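The double integral for the transition probability can be evaluated numerically. The following is a minimal sketch under stated assumptions: the load intervals are taken to partition $[0, D_{\max}]$ and are passed as a sorted list of boundaries, the inner integral over the destination interval is computed in closed form from the normal CDF of the truncated Gaussian, and the outer average over the source interval uses a midpoint rule. The function names, the boundary values in the usage note, and the number of integration steps are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_prob(lo, hi, d0, sigma, d_max):
    """P(lo <= D <= hi | D0 = d0) for a Gaussian truncated to [0, d_max]."""
    mass = normal_cdf((d_max - d0) / sigma) - normal_cdf((0.0 - d0) / sigma)
    return (normal_cdf((hi - d0) / sigma) - normal_cdf((lo - d0) / sigma)) / mass


def transition_matrix(bounds, elapsed, theta=200.0, steps=400):
    """Matrix K[i][j] for load intervals J_k = [bounds[k], bounds[k + 1]].

    bounds must be increasing, start at 0, and end at the maximal load.
    """
    d_max = bounds[-1]
    sigma = math.sqrt(theta * elapsed)  # variance grows linearly with the gap
    count = len(bounds) - 1
    matrix = []
    for i in range(count):
        lo_i, hi_i = bounds[i], bounds[i + 1]
        width = (hi_i - lo_i) / steps
        row = [0.0] * count
        for n in range(steps):
            # Midpoint rule for the average over a uniformly distributed D0 in J_i.
            d0 = lo_i + (n + 0.5) * width
            for j in range(count):
                row[j] += interval_prob(bounds[j], bounds[j + 1], d0, sigma, d_max) / steps
        matrix.append(row)
    return matrix
```

With made-up boundaries such as `[0.0, 300.0, 600.0, 1000.0]`, each row of the returned matrix sums to one, and the probability of staying in the same price interval shrinks as `elapsed` grows.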



3) Reward: Suppose that the system state of the previous
time slot is $(q_i, \Delta)$. If the action of the current time slot is
0, i.e. no inquiry, the expected reward of the current time slot
is given by (recall that the power consumption level $x$ is a
function of the assumed price)

$$r(0) = \sum_j K_{ij}(\Delta) \left( U(x(q_i)) - q_j x(q_i) \right). \tag{9}$$

Otherwise, the reward is given by (recall that $c$ is the cost
of communication)

$$r(1) = \sum_j K_{ij}(\Delta) \left( U(x(q_j)) - q_j x(q_j) \right) - c. \tag{10}$$
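The two expected rewards can be computed directly from one row of the transition matrix. The sketch below again assumes the logarithmic utility $U(x) = 100 \log x$ from the numerical section, so that $x(q) = 100/q$; the function names and the example prices are hypothetical.

```python
import math


def consumption(price, scale=100.0):
    """Optimal consumption x(q) = scale / q for U(x) = scale * log(x)."""
    return scale / price


def utility(x, scale=100.0):
    """Assumed utility U(x) = scale * log(x)."""
    return scale * math.log(x)


def expected_rewards(k_row, prices, i, cost=10.0):
    """Return (r(0), r(1)) when the last known price is prices[i].

    k_row[j] is the probability that the current true price is prices[j].
    """
    x_old = consumption(prices[i])
    # No inquiry: consume x(q_i) but pay the true price q_j.
    r_no_inquiry = sum(k * (utility(x_old) - q * x_old) for k, q in zip(k_row, prices))
    # Inquiry: consume x(q_j) at the true price q_j and pay the communication cost.
    r_inquiry = sum(
        k * (utility(consumption(q)) - q * consumption(q)) for k, q in zip(k_row, prices)
    ) - cost
    return r_no_inquiry, r_inquiry
```

When the price is certain not to have changed (`k_row` puts all mass on index `i`), the two rewards differ exactly by the communication cost, so the inquiry is pure overhead in that case.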



B. Value Iteration 

Having defined the elements of the MDP, we can apply dynamic
programming (DP) [3] to obtain the optimal strategy. We
denote by $R(s)$ the optimal expected total reward when the
initial state is $s$. Then, $R(s)$ satisfies the following Bellman
equation [2]:

$$R(s) = \max_a \left( r(s, a) + \beta E_{s,a}[R(s')] \right), \tag{11}$$

where $r(s, a)$ is the expected reward due to action $a$ and
system state $s$, $E_{s,a}$ is the expectation conditioned on $a$ and
$s$, and $s'$ is the system state in the next time slot. The expectation
can be computed using the state transition probability in (7).
The instantaneous reward $r(s, a)$ can be computed using (9)
and (10).

The Bellman equation can be solved by using the following
value iteration [3]:

$$R^{(t)}(s) = \max_a \left( r(s, a) + \beta E_{s,a}\left[ R^{(t-1)}(s') \right] \right), \tag{12}$$

where the superscript $t$ is the index of iteration. The iteration
converges to the solution of the Bellman equation as $t \to \infty$.
The optimal action is obtained from

$$a^*(s) = \arg\max_a \left( r(s, a) + \beta E_{s,a}[R(s')] \right). \tag{13}$$

Fig. 6: The base case modified from the PJM five-bus system [7].
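The value iteration described in this subsection can be sketched as follows. This is one possible implementation, not the authors' code, and it fixes an indexing convention that is an assumption of the sketch: a state `(i, d)` means that the last observed price is $q_i$ and that the current slot is `d` slots after that observation, with `d` running from 1 to the upper bound `horizon`, at which an inquiry is forced. The inputs `k`, `r0` and `r1` (transition matrices and expected rewards per gap) are hypothetical containers supplied by the caller.

```python
def value_iteration(k, r0, r1, horizon, beta=0.99, tol=1e-8, max_iters=100000):
    """Value iteration for the price-inquiry MDP (illustrative sketch).

    k[d][i][j]: probability that the price is q_j, d slots after observing q_i.
    r0[d][i], r1[d][i]: expected reward of no inquiry / inquiry in state (i, d).
    Returns (value, policy), dicts keyed by (i, d); policy entries are 0 or 1.
    """
    n = len(k[1])
    states = [(i, d) for i in range(n) for d in range(1, horizon + 1)]
    value = {s: 0.0 for s in states}
    policy = {s: 1 for s in states}
    for _ in range(max_iters):
        new_value = {}
        for i, d in states:
            # Inquiry: the true price q_j is revealed, so the next state is (j, 1).
            q_inquire = r1[d][i] + beta * sum(
                k[d][i][j] * value[(j, 1)] for j in range(n)
            )
            # No inquiry: allowed only while the gap is below the upper bound.
            if d < horizon:
                q_wait = r0[d][i] + beta * value[(i, d + 1)]
            else:
                q_wait = float("-inf")
            policy[(i, d)] = 0 if q_wait >= q_inquire else 1
            new_value[(i, d)] = max(q_wait, q_inquire)
        gap = max(abs(new_value[s] - value[s]) for s in states)
        value = new_value
        if gap < tol:
            break
    return value, policy
```

Since the discount factor is below one, each sweep is a contraction and the loop stops once successive value functions differ by less than `tol`. The returned `policy` gives the inquiry decision for every pair of last observed price and elapsed time.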



C. Myopic Strategy 

In the Bellman equation (11), the optimal action based
on the current system state should take the future reward
into account, thus making the solution complicated. A simpler
but perhaps suboptimal scheme is the myopic strategy, i.e.
optimizing the instantaneous reward $r$ without considering the
future system state evolution. Then, the corresponding action
is given by

$$a = \begin{cases} 0, & \text{if } r(0) \ge r(1), \\ 1, & \text{if } r(0) < r(1), \end{cases} \tag{14}$$

i.e. do not inquire the power price if the reward of no inquiry
is at least that of inquiry, and inquire otherwise. By applying the
expressions of rewards in (9) and (10), we have

$$a = \begin{cases} 0, & \text{if } c \ge \sum_j K_{ij}(\Delta) \, \delta r_{ij}, \\ 1, & \text{if } c < \sum_j K_{ij}(\Delta) \, \delta r_{ij}, \end{cases} \tag{15}$$

where $\delta r_{ij}$ is defined as

$$\delta r_{ij} = U(x(q_j)) - U(x(q_i)) - q_j \left( x(q_j) - x(q_i) \right), \tag{16}$$

which stands for the penalty of using the old value $q_i$ of the
power price when the true price is $q_j$. This penalty is
nonnegative because $x(q_j)$ maximizes $U(x) - q_j x$. Then, the
decision rule (15) means that, when
the cost of communication is at least the expected penalty of using
the old power price instead of the new one, the action should
be no price inquiry; otherwise, the home appliance should
inquire the power price.
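The myopic rule reduces to comparing the communication cost with an expected penalty, which the following sketch makes explicit. It takes the penalty to be the net reward lost by consuming at the level chosen for the old price when a different price actually applies, and it assumes the logarithmic utility $U(x) = 100 \log x$ from the numerical section, for which this penalty has the closed form $100 \, (q_j/q_i - 1 - \ln(q_j/q_i))$. The function names and example prices are hypothetical.

```python
import math


def penalty(q_old, q_new, scale=100.0):
    """Net reward lost by consuming x(q_old) when the true price is q_new.

    Assumes U(x) = scale * log(x), so x(q) = scale / q.
    """
    x_old, x_new = scale / q_old, scale / q_new
    best = scale * math.log(x_new) - q_new * x_new    # reward with the true price known
    stale = scale * math.log(x_old) - q_new * x_old   # reward when acting on the old price
    return best - stale


def myopic_action(k_row, prices, i, cost=10.0):
    """Return 1 (inquire) if the expected penalty exceeds the cost, else 0.

    k_row[j] is the probability that the current true price is prices[j];
    prices[i] is the last known price.
    """
    expected_penalty = sum(k * penalty(prices[i], q) for k, q in zip(k_row, prices))
    return 1 if cost < expected_penalty else 0
```

For example, with prices `[10.0, 20.0, 40.0]` and a last known price of 10, the rule skips the inquiry when the price has almost surely stayed the same, and inquires when the price has almost surely jumped to 40, where the penalty of about 161 far exceeds the cost of 10.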

IV. Numerical Results 

In this section, we use numerical simulations to explore the 
optimal power price inquiry based on the above framework. 
We use the PJM five-bus power system [7] for simulations, 
whose configuration is illustrated in Fig. 6. The corresponding
curves of LMP versus load for the five buses are shown in Fig.
7, and the lower boundaries for the load intervals are given in
Table I (note that $K = 7$). Both the curves and the data are
obtained from the continuous LMP model in [5].



A. Optimal Communication Strategy 

We assume that the upper bound of time slots between two
inquiries is 10 time slots. We set $\beta = 0.99$ for the discounted
sum of cost as the performance metric. The utility function of
the home appliance is assumed to be $U(x) = 100 \log x$. The
default values of $\theta$ and $c$ are set to 200 and 10, respectively.

Fig. 7: The LMP-load curve [5].

We first use the value iteration to obtain the optimal strategy 
and the corresponding minimum discounted sum of cost. Note 
that the cost here is defined as the gap of the corresponding re- 
ward to the ideal reward when there is no communication cost. 
We also tested the performance of the strategy of inquiring the 
power price in every time slot. The ratios of the discounted 
sum of cost obtained from the optimal strategy and that of 
the always-inquiry strategy are then computed for different
values of $\theta$ and are shown in Fig. 8. There are five curves
since there are five buses in the simulated power system.
Obviously, the smaller the ratio is, the better the performance 
gain of the optimal strategy of price inquiry is. We observe 
that, for buses A, B and C, the ratio ranges between 0.6 and 
0.8, i.e. the optimization can decrease the total cost by 20% to 
40%. The performance gain of bus D is smaller. The reason 
is that the price changes the most radically for bus D, thus 
requiring more price inquiries. Since the power price of bus E 
has only marginal changes with respect to the change of load, 
it is much less necessary to inquire the price, thus making the 
performance gain of bus E much more significant than other 
buses. 

Then, we compare the performance of the optimal strategy 
with that of the no-inquiry strategy. The curves of cost ratios
are shown in Fig. 9. In sharp contrast to Fig. 8, the order
of the performance gain is reversed. The performance gain of
bus E is the least since the necessity of price inquiry is the 
least for bus E. However, even for bus E, where the price 
changes only marginally, the optimal strategy can still achieve 
a very significant gain, thus demonstrating the necessity of 
price inquiry in smart grid. 

We repeated the simulations in Figures 8 and 9 for different
communication costs and fixed $\theta$; the results are shown in
Figures 10 and 11. In the two figures, increasing the
communication cost has opposite effects on the performance
gain. As the communication cost increases, the
home appliance should be more inclined not to inquire the
power price to reduce the cost of communication. Therefore,
the performance gain in Fig. 10 increases while that in Fig.
11 decreases.

Fig. 8: The ratio of cost between the optimal strategy and the
always-inquiry strategy with different $\theta$.

Fig. 9: The ratio of cost between the optimal strategy and the
no-inquiry strategy with different $\theta$.

B. Myopic Strategy 

In Figures 12 and 13, we compare the performance of
the optimal strategy and the myopic strategy with different
$\theta$ or different communication costs. In the ratio of cost,
the numerator is the cost of the optimal strategy while the 
denominator is that of the myopic strategy. We observe that 
the myopic strategy is very close to optimal. Therefore, in 
practical systems, one may consider the myopic strategy due 
to its simplicity. 

V. Conclusions 

We have considered the communication cost in smart grid,
which is often omitted in existing studies. The dynamics
of power price have been modeled as a Markov chain by
modeling the random process of load as a Brownian-motion-like
process and employing the LMP-load mapping curve. Then,
the decision of price inquiry has been formulated as an MDP
problem and dynamic programming has been employed to compute
the optimal strategy. To avoid the high computational cost, we
have studied a simple and suboptimal myopic strategy. A PJM
five-bus system has been used for numerical simulation, which
shows significant performance gain of the optimal strategy of
price inquiry, as well as the near-optimality of the myopic
approach.

Fig. 10: The ratio of cost between the optimal strategy and the
always-inquiry strategy with different communication costs.

Fig. 11: The ratio of cost between the optimal strategy and
the no-inquiry strategy with different communication costs.

Fig. 12: The ratio of cost between the optimal strategy and
the myopic strategy with different $\theta$.

Fig. 13: The ratio of cost between the optimal strategy and
the myopic strategy with different communication costs.